Poster: randomo Date: Oct 12, 2018 5:54am
Forum: texts Subject: Recurrent typos in Lord of the Rings

I've just set up an account here; I hope I'm posting this in the right place (and I apologize if the answer I'm seeking is readily findable somewhere on the site).

I was looking through the Archive's full-text file for The Lord of the Rings: The Return of the King (https://archive.org/stream/TheLordOfTheRing1TheFellowshipOfTheRing/The%20Return%20Of%20The%20King_djvu.txt) and noticed a recurring typo. In many places where there should have been an E, there was a J instead. For example, the character named Eowyn was repeatedly shown as Jowyn.

How can I report this to someone who can fix the text file? Thanks.

Poster: christ_chan64 Date: Oct 23, 2018 5:14am
Forum: texts Subject: Re: Recurrent typos in Lord of the Rings

I'm not an Archive employee, but the _djvu.txt file is a derivative file: it is generated automatically from the uploaded original.

Basically, when a PDF is uploaded, page images are extracted from it and run through optical character recognition (OCR) software called ABBYY. The readout is saved as a file named filename_abbyy.gz, and every other format is ultimately derived from this OCR readout, including the _djvu.txt file.
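If you want to look at these files directly, here's a rough Python sketch that builds download links for the OCR readout and the text derivative. It rests on two assumptions I haven't verified for this particular item: that item files are served at https://archive.org/download/<identifier>/<filename>, and that the source PDF is named "The Return Of The King.pdf" (guessed from the _djvu.txt name in the link in the original post). The function name is just for illustration.

```python
from urllib.parse import quote

# Assumption: archive.org serves item files at /download/<identifier>/<filename>.
DOWNLOAD_ROOT = "https://archive.org/download"


def derivative_urls(identifier: str, pdf_name: str) -> dict[str, str]:
    """Build URLs for the ABBYY readout and the _djvu.txt derivative of a PDF,
    following the <base>_abbyy.gz / <base>_djvu.txt naming described above."""
    if not pdf_name.lower().endswith(".pdf"):
        raise ValueError("expected a .pdf filename")
    base = pdf_name[:-4]  # strip the ".pdf" extension
    names = {
        "abbyy": f"{base}_abbyy.gz",
        "djvu_txt": f"{base}_djvu.txt",
    }
    # quote() percent-encodes spaces as %20, matching the link in the first post.
    return {
        kind: f"{DOWNLOAD_ROOT}/{quote(identifier)}/{quote(name)}"
        for kind, name in names.items()
    }


# Hypothetical PDF name, inferred from the _djvu.txt filename.
urls = derivative_urls("TheLordOfTheRing1TheFellowshipOfTheRing",
                       "The Return Of The King.pdf")
print(urls["djvu_txt"])
```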

Because OCR technology is imperfect, the readout is rarely 100% accurate, and errors creep in.
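Since the error in your file looks systematic (the same name misread the same way each time), a quick search of a downloaded copy will show how widespread it is. A minimal Python sketch: the suspect list holds only "Jowyn", the one example from the original post, and the sample text is made up for illustration.

```python
import re


def find_suspects(text: str, suspects: list[str]) -> list[tuple[int, str]]:
    """Return (line_number, word) for every whole-word hit of a suspect spelling."""
    if not suspects:
        return []
    # \b keeps "Jowyn" from matching inside longer words.
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, suspects)) + r")\b")
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for match in pattern.finditer(line):
            hits.append((line_no, match.group(1)))
    return hits


# Made-up sample text; in practice, read the downloaded _djvu.txt file instead.
sample = "Jowyn said nothing.\nEowyn smiled.\nLater, Jowyn left."
print(find_suspects(sample, ["Jowyn"]))
# [(1, 'Jowyn'), (3, 'Jowyn')]
```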

I doubt Archive employees have time to correct errors in OCR-ed text. If it were your item, I would suggest downloading the text file, correcting the errors, and reuploading it with the same filename. (I'm not sure whether the Archive software would overwrite the changes, though.)

Alternatively, you could make the corrections yourself and upload the corrected file as a separate item.
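Either way, the fixing itself can be scripted. Here's a rough Python sketch, assuming you've saved the text file locally as UTF-8. The mapping contains only the Jowyn/Eowyn case from this thread; any other entries you add should be checked against a printed copy first. The function names and filenames are hypothetical.

```python
import re
from pathlib import Path

# Only "Jowyn" -> "Eowyn" comes from this thread; extend after checking the book.
CORRECTIONS = {"Jowyn": "Eowyn"}


def apply_corrections(text: str, corrections: dict[str, str]) -> tuple[str, int]:
    """Replace whole-word misreads; return the corrected text and the number of fixes."""
    if not corrections:
        return text, 0
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, corrections)) + r")\b")
    # subn returns (new_text, number_of_substitutions).
    return pattern.subn(lambda m: corrections[m.group(1)], text)


def correct_file(src: Path, dest: Path) -> int:
    """Write a corrected copy of src to dest and return how many words were fixed."""
    text = src.read_text(encoding="utf-8")
    fixed, count = apply_corrections(text, CORRECTIONS)
    dest.write_text(fixed, encoding="utf-8")
    return count
```

For example, `correct_file(Path("The Return Of The King_djvu.txt"), Path("The Return Of The King_corrected.txt"))` leaves the original untouched so you can compare the two; rename the output if you go the same-filename route. The explicit word list is deliberate: a blanket J-to-E swap would also mangle genuine words such as "journey".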